PL-Tree: An Efficient Access Method for High-Dimensional Data

Abstract

The quest for processing spatial data in high-dimensional space has resulted in a number of innovative indexing mechanisms. Most of the early methods index data according to their geometric relationships. While they have succeeded to a certain degree, these methods either lack efficiency, particularly at higher dimensionality, or are too complex to implement. These drawbacks have made geometry-based indexing methods less desirable in large-scale applications involving high-dimensional data. To overcome these problems, we introduce a new algebra-based method for indexing high-dimensional data. Our method first partitions the original data space into hypercubes, and then labels each object in a hypercube using a bijective pairing function. All objects in the same hypercube share the same label, and all objects with the same label lie in the same hypercube. The bijective pairing function provides the means to map a high-dimensional vector to a scalar value. This partition-and-label process continues recursively until each hypercube contains a predetermined number of data points. From a structural point of view, our method forms a tree of labels, which we call a PL-tree, in which each parent node has a number of children. Hypercubes in the PL-tree are indexed by labels instead of high-dimensional vectors. In this paper, we present algorithms to construct a PL-tree index and algorithms to carry out range queries. We have also conducted experiments that compare the performance of the PL-tree with some popular indexing methods.
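
The partition-and-label idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say which bijective pairing function is used, so the Cantor pairing function is assumed here purely for illustration. The names `cantor_pair`, `label`, and `build`, the nested-dictionary tree layout, and the parameters `splits` and `max_depth` are all hypothetical and not taken from the paper.

```python
from math import isqrt


def cantor_pair(a: int, b: int) -> int:
    """Cantor pairing: a bijection from pairs of non-negative ints to non-negative ints.
    ASSUMPTION: the abstract does not name its pairing function."""
    return (a + b) * (a + b + 1) // 2 + b


def cantor_unpair(z: int) -> tuple[int, int]:
    """Inverse of cantor_pair."""
    w = (isqrt(8 * z + 1) - 1) // 2
    b = z - w * (w + 1) // 2
    return w - b, b


def label(cell) -> int:
    """Fold the pairing over a cell's integer coordinates to get one scalar label."""
    acc = cell[0]
    for c in cell[1:]:
        acc = cantor_pair(acc, c)
    return acc


def build(points, lo, width, capacity, splits=2, max_depth=16):
    """Recursively split a hypercube (corner `lo`, side `width`) into splits**d cells
    until each cell holds at most `capacity` points. Leaves are lists of points;
    inner nodes are dicts keyed by the scalar label of each non-empty child cell."""
    if len(points) <= capacity or max_depth == 0:
        return list(points)
    sub = width / splits
    buckets = {}
    for p in points:
        # Integer cell coordinates of p; clamp points lying on the upper boundary.
        cell = tuple(min(int((x - l) / sub), splits - 1) for x, l in zip(p, lo))
        buckets.setdefault(cell, []).append(p)
    node = {}
    for cell, pts in buckets.items():
        child_lo = tuple(l + c * sub for l, c in zip(lo, cell))
        node[label(cell)] = build(pts, child_lo, sub, capacity, splits, max_depth - 1)
    return node


tree = build([(0.1, 0.1), (0.9, 0.9), (0.1, 0.9)], lo=(0.0, 0.0), width=1.0, capacity=1)
```

In this sketch, points that fall in the same cell at a given level always receive the same key, and because the pairing is a bijection, different cells never collide. A lookup therefore compares scalars at each level rather than full coordinate vectors.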


Similar resources

The PL-Tree: A Fast High-Dimensional Access Method for Range Queries

The quest for processing range queries of high-dimensional data has resulted in a number of innovative indexing mechanisms. Most of the early methods index data according to their geometric relationships. While they have met with certain success, these methods, unfortunately, either lack efficiency, particularly at higher dimensionality, or are too complex to implement. These drawbacks have made...


Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data

Background and purpose: By evolving science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimension problems, classical methods are not reliable because of a large number of predictor variable...


Enforcing RBAC Policies over Data Stored on Untrusted Server (Extended Version)

One of the security issues in data outsourcing is the enforcement of the data owner’s access control policies. This includes some challenges. The first challenge is preserving confidentiality of data and policies. One of the existing solutions is encrypting data before outsourcing which brings new challenges; namely, the number of keys required to access authorized resources, efficient policy u...


Calculation of One-dimensional Forward Modelling of Helicopter-borne Electromagnetic Data and a Sensitivity Matrix Using Fast Hankel Transforms

The helicopter-borne electromagnetic (HEM) frequency-domain exploration method is an airborne electromagnetic (AEM) technique that is widely used for vast and rough areas for resistivity imaging. The vast amount of digitized data flowing from the HEM method requires an efficient and accurate inversion algorithm. Generally, the inverse modelling of HEM data in the first step requires a precise a...


APG: An Efficient Software Program for Amp-Pl Thermobarometry Based on Graphical Method

Geothermobarometry equations are based on thermodynamic principles and appear in single or multi-variant functions. The number of variants for a specific composition or reaction usually is reduced into 2 involving temperature (T) and pressure (P). Since most of planned equations have two passive or variant P and T, using these equations should be with special care. It is very effective to use g...




Publication date: 2012